I am Emily Josephs
- interested in: evolutionary genetics, plants, triathlons, cat
October 19, 2023
Data into spreadsheet
Wrangle data into R
Visualize data
Build a model
Use algorithm to parameterize model
Interpret results
Still using R, but focus won’t be on the language itself!
This part of the course may feel like it moves a little faster.
There will be math!
You will be learning a new skill, using a young skill, which is hard!
The material will build on itself.
I am learning along with you!!!
ALSO:
Everything else in the world is happening!!!
take a deep breath!
believe in yourself!!!
remember that your primary goal is present understanding, but a solid secondary goal is future understanding.
try to stay on top of the work.
don’t be afraid of office hours!!
Change from syllabus:
Office hours are Mondays 2-4pm in PLB 266 or on zoom.
Zoom Room: https://msu.zoom.us/my/emjos (Password: plantsrule)
Many of you got into this business because of a love of nature, not a love of stats and computers.
Our goal is to try to translate your love of nature into a love of patterns, and give you the tools to do so!
Biology has become a driver of quantitative methods in STEM because interesting patterns are subtle and complicated.
Also, coding and stats are very useful skills.
Simulations?
Simulation is one of the most powerful tools that we’ll have in our statistics learning toolkit.
Simulations let you generate data that you know should look a certain way, so you can test your intuitions.
Simulations also let you do the same thing over and over and over.
thisClass = 1:35
thisClass
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35
sample(thisClass, size=1)
## [1] 31
sample(thisClass, size=1)
## [1] 15
sample(thisClass, size=1)
## [1] 14
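Each call to sample() above returns a different student. If you want a simulation to give the same answer every time you rerun it, one common approach is to set the random seed first. This is a minimal sketch; the seed value 42 is an arbitrary assumption, and any integer would work.

```r
# Assumption: 42 is an arbitrary seed chosen for illustration.
set.seed(42)
firstPick <- sample(1:35, size = 1)

# Resetting the same seed replays the same "random" draw.
set.seed(42)
secondPick <- sample(1:35, size = 1)

identical(firstPick, secondPick)  # TRUE
```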
How do you define probability?
A measure of the likelihood that an event will occur
A long-term frequency
How often we expect an event to happen (‘degree of belief’)
Will I get to Park Place?
Which definition matches the probability of landing on Park Place?
When will I give birth?
Which definition matches the probability of giving birth on a specific day?
Who will win the election?
Which definition matches the probability of an election outcome?
The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.
A simple event consists of a single experiment and has a single outcome
The sum of the probabilities of all possible outcomes of an event is equal to 1.
The probability of a shared event is the product of the probabilities of its constitutive events, so long as they’re independent.
The probability of a complex event is the sum of the probabilities of its constitutive events.
A set is an unordered collection of distinct objects.
S = {1,2,3,4,5} = {4,3,2,5,1}
The intersection of two sets is all the elements that appear in both sets.
{1,2,3} \(\cap\) {2,3,4} = {2,3}
The union of two sets is every element that appears in either set.
{1,2,3} \(\cup\) {2,3,4} = {1,2,3,4}
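These set operations can be tried directly in R. A small sketch using base R's intersect(), union(), and setequal(); the variable names A and B are just for illustration.

```r
A <- c(1, 2, 3)
B <- c(2, 3, 4)

intersect(A, B)   # elements that appear in both sets: 2 3
union(A, B)       # elements that appear in either set: 1 2 3 4

# Order does not matter for sets
setequal(c(1, 2, 3, 4, 5), c(4, 3, 2, 5, 1))   # TRUE
```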
I have a set {1,2,3,4,5}
What’s the probability that, if I draw a number at random, I’ll get a 5?
This is a simple event: a single draw with a single outcome.
help(sample)
numbers <- c(1,2,3,4,5)
sample(numbers,1)
## [1] 3
sample(numbers,1)
## [1] 2
sample(numbers,1)
## [1] 2
Working in teams, figure out a way to use a large-scale simulation (1e5 trials) to estimate the probability of picking a 5 in the following set:
{1,2,3,4,5}
sample()!
numbers <- 1:5
nDraws <- 1e5
mySamples <- sample(numbers,nDraws,replace=TRUE)
length(which(mySamples==5))/nDraws
## [1] 0.201
1/5
## [1] 0.2
We denote the probability of an event with P().
So, we can write the probability of a simple event e as:
\(\Large P(e)\)
as in:
\(\Large P(\text{drawing a }5) = \frac{1}{5}\)
What’s the probability of drawing a 4 out of {1,2,3,4,5}?
\(\Large P(4) = P(5) = P(1) = P(2) = P(3) = \frac{1}{5}\)
\(\Large \sum\limits_{i=1}^n P(e_{i}) = 1\)
The sum of the probabilities of all the outcomes of an event is 1
\(\Large P(4) + P(5) + P(1) + P(2) + P(3) = 1\)
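The same rule can be checked by simulation. A sketch that estimates the probability of every outcome at once and confirms the estimates add up to 1; the name estimatedProbs is made up for this example.

```r
numbers <- 1:5
nDraws <- 1e5
mySamples <- sample(numbers, nDraws, replace = TRUE)

# Estimated probability of each outcome:
# number of times it occurred / total number of trials
estimatedProbs <- table(factor(mySamples, levels = numbers)) / nDraws
estimatedProbs        # each entry should be close to 0.2
sum(estimatedProbs)   # 1 (up to floating-point rounding)
```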
A shared event is the simultaneous occurrence of simple events
The probability of a shared event is denoted P(A \(\cap\) B)
(read as probability of the intersection of simple events A and B)
E.g., probability of drawing 2 numbers and getting a 4 AND a 5
the probability of a shared event is the product of the probabilities of its constitutive simple events
[As long as the simple events are independent, meaning that the outcome of one event does not depend on the outcome of another.]
numbers <- 1:5
What’s the probability you draw the sequence (4,5), in that order?
nDraws <- 1e5
## sample two draws (with replacement), nDraws times
two.draws <- replicate(nDraws,
  sample(numbers,2,replace=TRUE),
  simplify=FALSE)
## go through the list: which draws are exactly the sequence (4,5)?
is45 <- lapply(two.draws, function(x){
  sum(x==c(4,5))==2
})
## count them and divide by the number of trials
prop45 <- sum(unlist(is45))/nDraws
prop45
## [1] 0.0402
the probability of a shared event is the product of the probabilities of the simple events that make up the shared event:
\(\Large p(4 \text{ in first draw}) = \frac{1}{5}\)
\(\Large p(5 \text{ in second draw}) = \frac{1}{5}\)
\(\Large p(\text{sequence } (4,5) ) = \frac{1}{5} \times \frac{1}{5} = 0.04\)
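Because the state space here is tiny, the product rule can also be checked exactly, without any randomness, by listing every possible ordered pair. A sketch using expand.grid(); the names allPairs and p45 are made up for this example.

```r
numbers <- 1:5

# All ordered pairs (first draw, second draw) when sampling with replacement
allPairs <- expand.grid(first = numbers, second = numbers)
nrow(allPairs)   # 25 equally likely sequences

# Fraction of sequences that are exactly (4, 5)
p45 <- mean(allPairs$first == 4 & allPairs$second == 5)
p45                                     # 0.04
isTRUE(all.equal(p45, (1/5) * (1/5)))   # TRUE
```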
Complex events are composites of simple events
e.g., rolling two dice and summing their values
The probability of a complex event is the sum of the probabilities of the simple events that make up the complex event
numbers <- c(1,2,3,4,5)
If I draw two numbers (with replacement, so the same number can come up twice), what’s the probability that they sum to 6?
Return to teams and update your simulations to test this.
two.draws <- numeric(1e5)
for(i in 1:1e5){
two.draws[i] <- sum(
sample(1:5,2,replace=TRUE)
)
}
length(which(two.draws==6))/1e5
## [1] 0.2
The same simulation, written with replicate() instead of a for loop:
two.draws <- replicate(1e5,
  sum(sample(1:5,2,replace=TRUE)))
length(which(two.draws==6))/1e5
## [1] 0.201
Analytically: if I draw two numbers with replacement, what’s the probability that they sum to 6?
The total state space is 5 \(\times\) 5 outcomes, each of which is equi-probable (1/25).
And we can get a sum of 6 with 5 of those outcomes:
{(1,5),(2,4),(3,3),(4,2),(5,1)}
So the probability of drawing two numbers that sum to 6 is the sum of the probabilities of the simple events that make up that complex event.
\(\large P(a + b = 6) = \frac{1}{25} + \frac{1}{25} + \frac{1}{25} + \frac{1}{25} + \frac{1}{25} = \frac{1}{5}\)
In general,
\(\large P(E_{1} \cup E_{2}) = P(E_{1}) + P(E_{2})\)
If \(E_{1}\) and \(E_{2}\) are mutually exclusive events
(This is also an axiom)
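The counting argument for the sum-to-6 example can be reproduced exactly in R by building the full 5 x 5 table of sums. A sketch using outer(); the names sums and pSum6 are made up for this example.

```r
numbers <- 1:5

# 5 x 5 matrix of sums: rows are the first draw, columns are the second draw
sums <- outer(numbers, numbers, "+")

sum(sums == 6)   # 5 outcomes sum to 6
pSum6 <- sum(sums == 6) / length(sums)
pSum6            # 0.2, i.e. 5 mutually exclusive outcomes at 1/25 each
```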
You record the hair color and shoe phenotype of all the kids in a daycare.
| | plain | sparkly |
|---|---|---|
| black hair | 48 | 10 |
| brown hair | 35 | 7 |
If you pick a kid at random, what is the probability they have black hair and sparkly shoes?
p(black hair AND sparkly shoes) = 10/100 = 0.1
What’s the probability of having black hair?
Note that there are now two categories of kids with black hair: the plain-shoed and the sparkle-shoed.
p(black hair) = p(black hair AND plain shoes) + p(black hair AND sparkly shoes) = 0.48 + 0.10 = 0.58
What’s the probability of picking two kids that both have black hair and sparkly shoes?
p(black hair and sparkly shoes) × p(black hair and sparkly shoes) = 0.1 × 0.1 = 0.01 (treating the two picks as independent)
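The three daycare answers can be reproduced from the table in R. A sketch that stores the counts as a matrix; the names kids and probs are made up for this example, and the last line assumes the two picks are independent (for example, the first kid goes back into the group before the second pick).

```r
kids <- matrix(c(48, 10,
                 35,  7),
               nrow = 2, byrow = TRUE,
               dimnames = list(hair  = c("black", "brown"),
                               shoes = c("plain", "sparkly")))

# Joint probabilities: each count divided by the total number of kids (100)
probs <- kids / sum(kids)

probs["black", "sparkly"]      # 0.1   black hair AND sparkly shoes
sum(probs["black", ])          # 0.58  black hair (plain or sparkly shoes)
probs["black", "sparkly"]^2    # 0.01  two such kids, assuming independent picks
```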
| | plain | sparkly | total |
|---|---|---|---|
| black hair | 48 | 10 | 58 |
| brown hair | 35 | 7 | 42 |
| total | 83 | 17 | 100 |
- probability of sparkly shoes IF you have black hair?
a conditional probability is the probability of one outcome conditional on another
written as p(A | B), read as “probability of A given B”
probability of the intersection of A and B, divided by the probability of B
\(\Huge p(A|B) = \frac{p(A ~ \cap ~ B)}{p(B)}\)
- probability of sparkly shoes IF you have black hair?
- p(sparkly | black hair) = p(sparkly \(\cap\) black hair) / p(black hair) = 0.10 / 0.58 = 0.172
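The conditional probability can be computed step by step from the counts. A sketch that rebuilds the daycare table as a matrix; the variable names are made up for this example.

```r
kids <- matrix(c(48, 10,
                 35,  7),
               nrow = 2, byrow = TRUE,
               dimnames = list(hair  = c("black", "brown"),
                               shoes = c("plain", "sparkly")))

pBlackAndSparkly <- kids["black", "sparkly"] / sum(kids)   # p(A and B) = 0.10
pBlack <- sum(kids["black", ]) / sum(kids)                 # p(B) = 0.58

# p(A | B) = p(A and B) / p(B)
pSparklyGivenBlack <- pBlackAndSparkly / pBlack
round(pSparklyGivenBlack, 3)   # 0.172
```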
recall that if A and B are independent, p( A \(\cap\) B) = p(A) * p(B)
so if A and B are independent:
\(\Large \begin{aligned} p(A \mid B) &= \frac{p(A ~ \cap ~ B)}{p(B)} \\ \\ &= \frac{p(A)p(B)}{p(B)} = p(A) \end{aligned}\)
| | plain | sparkly | total |
|---|---|---|---|
| black hair | 48 | 10 | 58 |
| brown hair | 35 | 7 | 42 |
| total | 83 | 17 | 100 |
Does a kid’s hair color affect the probability that their parents will get them a sweet pair of sparkly shoes?
p(sparkle shoes | black hair) = 0.172
p(sparkle shoes) = 0.17
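The two numbers are almost identical, which is what we would expect if shoe type were independent of hair color. One way to look at the whole table at once is to compare the observed joint probabilities with the product of the marginal probabilities. This is only a sketch: the variable names are made up, and it compares the numbers by eye rather than doing a formal statistical test.

```r
kids <- matrix(c(48, 10,
                 35,  7),
               nrow = 2, byrow = TRUE,
               dimnames = list(hair  = c("black", "brown"),
                               shoes = c("plain", "sparkly")))
probs <- kids / sum(kids)    # observed joint probabilities

pHair  <- rowSums(probs)     # p(black hair), p(brown hair)
pShoes <- colSums(probs)     # p(plain shoes), p(sparkly shoes)

# If hair and shoes were independent, p(hair AND shoes) = p(hair) * p(shoes)
expectedIfIndependent <- outer(pHair, pShoes)

round(expectedIfIndependent, 3)
max(abs(probs - expectedIfIndependent))   # largest gap between observed and expected
```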
The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.
The probability of a complex event is the sum of the probabilities of its constitutive events.
The probability of a shared event is the product of the probabilities of its constitutive events, so long as they’re independent.
The sum of the probabilities of all possible outcomes of an event is equal to 1.
A conditional probability is probability of one outcome conditional on another, and is equal to the probability of the intersection of the outcomes, divided by the probability of the condition
You roll a pair of 4-sided dice 100 times in a row and record the combination of numbers you get (1st roll in row, 2nd roll in col):
| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 7 | 6 | 8 | 4 |
| 2 | 5 | 4 | 4 | 8 |
| 3 | 7 | 9 | 8 | 6 |
| 4 | 9 | 5 | 6 | 4 |
POLL - p(the first roll is a “1”)?
A - 7/100
B - (7+6+8+4)/100
C - (7+5+7+9)/100
D - 7/(7+5+7+9)
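One way to check your answer is to enter the table into R and add up the counts in the relevant row. A sketch, assuming (as stated above) that rows are the first roll and columns are the second roll; the names rolls and pFirst are made up for this example.

```r
rolls <- matrix(c(7, 6, 8, 4,
                  5, 4, 4, 8,
                  7, 9, 8, 6,
                  9, 5, 6, 4),
                nrow = 4, byrow = TRUE,
                dimnames = list(first  = as.character(1:4),
                                second = as.character(1:4)))

sum(rolls)   # 100 recorded pairs of rolls

# Observed probability of each value of the first roll:
# row total / total number of trials
pFirst <- rowSums(rolls) / sum(rolls)
pFirst["1"]   # 0.25, i.e. (7+6+8+4)/100
```

Summing a column instead, for example (7+5+7+9)/100, gives the observed probability that the second roll is a 1.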
POLL - Expected p(rolling (2,2))?
A - \(\frac{1}{4} \times \frac{1}{4}\)
B - \(\frac{1}{4} + \frac{1}{4}\)
C - \({(\frac{1}{4}})^{4}\)
POLL - Expected p(sum of both rolls is greater than 3)?
A - \(\frac{1}{4} \times \frac{1}{4} \times \frac{13}{16}\)
B - \(\frac{13}{16}\)
C - \(\frac{1}{2}\)
POLL - Expected p(sum of both rolls > 3 given first roll is a 1)?
A - \(\frac{1}{4} \times \frac{1}{4}\)
B - \(\frac{2}{16}\)
C - \(\frac{1}{2}\)
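The "expected" probabilities in these polls do not depend on the recorded counts, only on the 16 equally likely outcomes of two fair 4-sided dice. A sketch that enumerates those outcomes so you can check your poll answers; it assumes fair, independent dice, and the names faces, grid, and total are made up for this example.

```r
faces <- 1:4

# All 16 equally likely (first roll, second roll) outcomes
grid <- expand.grid(first = faces, second = faces)
total <- grid$first + grid$second

mean(grid$first == 2 & grid$second == 2)   # 0.0625 = 1/16
mean(total > 3)                            # 0.8125 = 13/16
mean(total[grid$first == 1] > 3)           # 0.5, conditioning on first roll = 1
```

The last line is a conditional probability: it keeps only the outcomes where the first roll is a 1, then asks what fraction of those have a sum greater than 3.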